144 research outputs found

    Cleaning Genotype Data from Diversity Outbred Mice.

    Data cleaning is an important first step in most statistical analyses, including efforts to map the genetic loci that contribute to variation in quantitative traits. Here we illustrate approaches to quality control and cleaning of array-based genotyping data for multiparent populations (experimental crosses derived from more than two founder strains), using MegaMUGA array data from a set of 291 Diversity Outbred (DO) mice. Our approach employs data visualizations that can reveal problems at the level of individual mice or with individual SNP markers. We find that the proportion of missing genotypes for each mouse is an effective indicator of sample quality. We use microarray probe intensities for SNPs on the X and Y chromosomes to confirm the sex of each mouse, and we use the proportion of matching SNP genotypes between pairs of mice to detect sample duplicates. We use a hidden Markov model (HMM) reconstruction of the founder haplotype mosaic across each mouse genome to estimate the number of crossovers and to identify potential genotyping errors. To evaluate marker quality, we find that missing data and genotyping error rates are the most effective diagnostics. We also examine the SNP genotype frequencies with markers grouped according to their minor allele frequency in the founder strains. For markers with high apparent error rates, a scatterplot of the allele-specific probe intensities can reveal the underlying cause of incorrect genotype calls. The decision to include or exclude low-quality samples can have a significant impact on the mapping results for a given study. We find that the impact of low-quality markers on a given study is often minimal, but reporting problematic markers can improve the utility of the genotyping array across many studies
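    Two of the sample-level checks named in this abstract, the proportion of missing genotypes per mouse and the proportion of matching genotypes between pairs of mice, can be sketched roughly as follows. This is only an illustration and not the authors' code: the genotype coding ("A", "H", "B", with None for a missing call), the function names, and both thresholds are assumptions made for the example.

```python
# Hypothetical sketch: genotypes are coded "A", "H", "B"; None marks a missing call.
# The thresholds below are illustrative assumptions, not values from the paper.

def missing_proportion(calls):
    """Fraction of markers with no genotype call for one sample."""
    return sum(c is None for c in calls) / len(calls)


def matching_proportion(calls_a, calls_b):
    """Fraction of identical calls among markers called in both samples."""
    both = [(a, b) for a, b in zip(calls_a, calls_b)
            if a is not None and b is not None]
    if not both:
        return float("nan")  # no shared markers, so nothing to compare
    return sum(a == b for a, b in both) / len(both)


def flag_samples(genotypes, max_missing=0.10, min_match=0.99):
    """Return (low-quality sample IDs, candidate duplicate pairs)."""
    low_quality = [sample for sample, calls in genotypes.items()
                   if missing_proportion(calls) > max_missing]
    ids = sorted(genotypes)
    duplicates = [(a, b)
                  for i, a in enumerate(ids)
                  for b in ids[i + 1:]
                  if matching_proportion(genotypes[a], genotypes[b]) >= min_match]
    return low_quality, duplicates


# Toy data: mouse1 and mouse2 are identical; mouse3 has many missing calls.
example = {
    "mouse1": ["A", "H", "B", "A", "H", "B", "A", "A", "H", "B"],
    "mouse2": ["A", "H", "B", "A", "H", "B", "A", "A", "H", "B"],
    "mouse3": ["B", "B", None, None, "A", None, "H", "A", "B", "A"],
}
low_quality, duplicates = flag_samples(example)
print(low_quality, duplicates)
```

    In practice the cutoffs would be chosen by inspecting the distributions of both statistics, which is the role the abstract assigns to data visualization.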

    Prediction performance of linear models and gradient boosting machine on complex phenotypes in outbred mice.

    We compared the performance of linear (GBLUP, BayesB, and elastic net) methods to a nonparametric tree-based ensemble (gradient boosting machine) method for genomic prediction of complex traits in mice. The dataset used contained genotypes for 50,112 SNP markers and phenotypes for 835 animals from 6 generations. Traits analyzed were bone mineral density, body weight at 10, 15, and 20 weeks, fat percentage, circulating cholesterol, glucose, insulin, triglycerides, and urine creatinine. The youngest generation was used as a validation subset, and predictions were based on all older generations. Model performance was evaluated by comparing predictions for animals in the validation subset against their adjusted phenotypes. Linear models outperformed gradient boosting machine for 7 out of 10 traits. For bone mineral density, cholesterol, and glucose, the gradient boosting machine model showed better prediction accuracy and lower relative root mean squared error than the linear models. Interestingly, for these 3 traits, there is evidence of a relevant portion of phenotypic variance being explained by epistatic effects. Using a subset of top markers selected from a gradient boosting machine model helped for some of the traits to improve the accuracy of prediction when these were fitted into linear and gradient boosting machine models. Our results indicate that gradient boosting machine is more strongly affected by data size and decreased connectedness between reference and validation sets than the linear models. Although the linear models outperformed gradient boosting machine for the polygenic traits, our results suggest that gradient boosting machine is a competitive method to predict complex traits with assumed epistatic effects

    Adding gene transcripts into genomic prediction improves accuracy and reveals sampling time dependence.

    Recent developments allowed generating multiple high-quality 'omics' data that could increase the predictive performance of genomic prediction for phenotypes and genetic merit in animals and plants. Here, we have assessed the performance of parametric and nonparametric models that leverage transcriptomics in genomic prediction for 13 complex traits recorded in 478 animals from an outbred mouse population. Parametric models were implemented using the best linear unbiased prediction, while nonparametric models were implemented using the gradient boosting machine algorithm. We also propose a new model named GTCBLUP that aims to remove between-omics-layer covariance from predictors, whereas its counterpart GTBLUP does not do that. While gradient boosting machine models captured more phenotypic variation, their predictive performance did not exceed the best linear unbiased prediction models for most traits. Models leveraging gene transcripts captured higher proportions of the phenotypic variance for almost all traits when these were measured closer to the moment of measuring gene transcripts in the liver. In most cases, the combination of layers was not able to outperform the best single-omics models to predict phenotypes. Using only gene transcripts, the gradient boosting machine model was able to outperform best linear unbiased prediction for most traits except body weight, but the same pattern was not observed when using both single nucleotide polymorphism genotypes and gene transcripts. Although the GTCBLUP model was not able to produce the most accurate phenotypic predictions, it showed the highest accuracies for breeding values for 9 out of 13 traits. We recommend using the GTBLUP model for prediction of phenotypes and using the GTCBLUP for prediction of breeding values.

    A General Bayesian Approach to Analyzing Diallel Crosses of Inbred Strains

    The classic diallel takes a set of parents and produces offspring from all possible mating pairs. Phenotype values among the offspring can then be related back to their respective parentage. When the parents are diploid, sexed, and inbred, the diallel can characterize aggregate effects of genetic background on a phenotype, revealing effects of strain dosage, heterosis, parent of origin, epistasis, and sex-specific versions thereof. However, its analysis is traditionally intricate, unforgiving of unplanned missing information, and highly sensitive to imbalance, making the diallel unapproachable to many geneticists. Nonetheless, imbalanced and incomplete diallels arise frequently, albeit unintentionally, as by-products of larger-scale experiments that collect F1 data, for example, pilot studies or multiparent breeding efforts such as the Collaborative Cross or the Arabidopsis MAGIC lines. We present a general Bayesian model for analyzing diallel data on dioecious diploid inbred strains that cleanly decomposes the observed patterns of variation into biologically intuitive components, simultaneously models and accommodates outliers, and provides shrinkage estimates of effects that automatically incorporate uncertainty due to imbalance, missing data, and small sample size. We further present a model selection procedure for weighing evidence for or against the inclusion of those components in a predictive model. We evaluate our method through simulation and apply it to incomplete diallel data on the founders and F1's of the Collaborative Cross, robustly characterizing the genetic architecture of 48 phenotypes

    Drug safety Africa: An overview of safety pharmacology & toxicology in South Africa.

    This meeting report is based on presentations given at the first Drug Safety Africa Meeting in Potchefstroom, South Africa from November 20-22, 2018 at the North-West University campus. There were 134 attendees (including 26 speakers and 34 students) from the pharmaceutical industry, academia, regulatory agencies as well as 6 exhibitors. These meeting proceedings are designed to inform the content that was presented in terms of Safety Pharmacology (SP) and Toxicology methods and models that are used by the pharmaceutical industry to characterize the safety profile of novel small chemical or biological molecules. The first part of this report includes an overview of the core battery studies defined by cardiovascular, central nervous system (CNS) and respiratory studies. Approaches to evaluating drug effects on the renal and gastrointestinal systems and murine phenotyping were also discussed. Subsequently, toxicological approaches were presented including standard strategies and options for early identification and characterization of risks associated with a novel therapeutic, the types of toxicology studies conducted and relevance to risk assessment supporting first-in-human (FIH) clinical trials and target organ toxicity. Biopharmaceutical development and principles of immunotoxicology were discussed as well as emerging technologies. An additional poster session was held that included 18 posters on advanced studies and topics by South African researchers, postgraduate students and postdoctoral fellows

    High-Resolution Genetic Mapping Using the Mouse Diversity Outbred Population

    The JAX Diversity Outbred population is a new mouse resource derived from partially inbred Collaborative Cross strains and maintained by randomized outcrossing. As such, it segregates the same allelic variants as the Collaborative Cross but embeds these in a distinct population architecture in which each animal has a high degree of heterozygosity and carries a unique combination of alleles. Phenotypic diversity is striking and often divergent from phenotypes seen in the founder strains of the Collaborative Cross. Allele frequencies and recombination density in early generations of Diversity Outbred mice are consistent with expectations based on simulations of the mating design. We describe analytical methods for genetic mapping using this resource and demonstrate the power and high mapping resolution achieved with this population by mapping a serum cholesterol trait to a 2-Mb region on chromosome 3 containing only 11 genes. Analysis of the estimated allele effects in conjunction with complete genome sequence data of the founder strains reduced the pool of candidate polymorphisms to seven SNPs, five of which are located in an intergenic region upstream of the Foxo1 gene

    Telomere Length Shows No Association with BRCA1 and BRCA2 Mutation Status

    This study aimed to determine whether telomere length (TL) is a marker of cancer risk or genetic status amongst two cohorts of BRCA1 and BRCA2 mutation carriers and controls. The first group was a prospective set of 665 male BRCA1/2 mutation carriers and controls (mean age 53 years), all healthy at time of enrolment and blood donation, 21 of whom have developed prostate cancer whilst on study. The second group consisted of 283 female BRCA1/2 mutation carriers and controls (mean age 48 years), half of whom had been diagnosed with breast cancer prior to enrolment. TL was quantified by qPCR from DNA extracted from peripheral blood lymphocytes. Weighted and unweighted Cox regressions and linear regression analyses were used to assess whether TL was associated with BRCA1/2 mutation status or cancer risk. We found no evidence for association between developing cancer or being a BRCA1 or BRCA2 mutation carrier and telomere length. It is the first study investigating TL in a cohort of genetically predisposed males and although TL and BRCA status was previously studied in females our results don't support the previous finding of association between hereditary breast cancer and shorter TL